Target Mining Interface



Select Your Query Sequence 



• Enter PDB accession number (e.g. 1QMA): 1lfa and chain (e.g. B): a

OR

• Enter one Swiss-Prot accession (e.g. P27504) or GenBank protein ID (e.g. CAB08761.1):



Select Database 



Release: 



DEVF9 = BPD3 □ 



Apply Filters 



• Iteration Filter. PSI-BLAST matches to be excluded:

Matches detected during the first 20 forward iterations

If you select e.g. "Matches detected during the first 3 iterations", these matches will be excluded from the report
(using the first_PB_iter annotation). This allows you to focus on more remote homologues which have been
detected after 4 or more PSI-BLAST iterations. Matches detected using PSI-BLAST with negative iterations or
using Genome-Threader are not affected by this option. However, if a match is found both during the first e.g. 3
PSI-BLAST iterations and by Genome-Threader, it will be excluded.

• Filter for the following SPECIES: 



□ Homo sapiens
□ Rattus norvegicus (Rat)
□ Mus musculus (Mouse)
□ Danio rerio (Zebra fish)



FIG. 2A 
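
The iteration and species filters shown in this figure can be read as a simple exclusion rule. The following is a minimal Python sketch of that reading, not the interface's actual code: the `Match` record, its field names, the `apply_filters` helper, and the convention of storing negative-iteration matches as negative integers are all assumptions made for illustration. Only the `first_PB_iter` annotation name comes from the figure.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Match:
    """Hypothetical record for one reported hit (field names are assumptions)."""
    accession: str
    species: str
    # First forward PSI-BLAST iteration that detected the hit (1-based).
    # Assumed conventions: None = not found by PSI-BLAST at all,
    # negative value = found only with negative iterations.
    first_pb_iter: Optional[int] = None
    found_by_genome_threader: bool = False


def apply_filters(
    matches: Iterable[Match],
    exclude_first_n: int = 0,
    species: Optional[Set[str]] = None,
) -> List[Match]:
    """Drop hits first seen within the first N forward iterations,
    then keep only the selected species (if any were ticked)."""
    kept = []
    for m in matches:
        early = (
            m.first_pb_iter is not None
            and 1 <= m.first_pb_iter <= exclude_first_n
        )
        if early:
            # Excluded even when Genome-Threader also found the hit.
            continue
        if species and m.species not in species:
            continue
        kept.append(m)
    return kept


if __name__ == "__main__":
    # Placeholder hits; the accessions are made-up labels, not real entries.
    hits = [
        Match("EARLY", "Homo sapiens", first_pb_iter=2, found_by_genome_threader=True),
        Match("LATE", "Homo sapiens", first_pb_iter=5),
        Match("GT_ONLY", "Mus musculus", found_by_genome_threader=True),
    ]
    for m in apply_filters(hits, exclude_first_n=3, species={"Homo sapiens"}):
        print(m.accession)
```

Hits found only by Genome-Threader or only with negative iterations pass the iteration filter untouched, which matches the help text; the species filter is applied afterwards and is skipped when no species is selected.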
http://London-bridge.inpharmatica.co.uk/cgi-bin/volker/getTargetBPD3.pl

2) 84 additional hits identified by both Genome Threader and PSI-BLAST:

Combined Genome Threader and PSI-BLAST output: PSI-BLAST values are shown in maroon!




[Results table; row and column alignment was lost in extraction. Legible column headings: Target (GT, PSI); Conf. (GT); 1st iter. (PSI); Best iter. (PSI); E-value (PSI). Each target row carries "drill through", "Top50BlastHits" and "Red.Seq.View" links.]

Targets: AAA59544.1, Q99715, AAB24821.1, P20701, AAC31672.1, CAA72402.1, AAB38702.1, CAB70853.1, CAA27972.1, AAB59512.1 (printed elsewhere in the table as AAB59312.1), CAA07569.1

Descriptions: leukocyte integrin alpha chain; COLLAGEN ALPHA 1(XII) CHAIN PRECURSOR; LEUKOCYTE ADHESION GLYCOPROTEIN LFA-1 ALPHA CHAIN PRECURSOR (LEUKOCYTE FUNCTION ASSOCIATED MOLECULE 1, ALPHA CHAIN) (CD11A) (INTEGRIN ALPHA-L); leukocyte function-associated molecule-1 alpha subunit; collagen type XIV; cartilage matrix protein; hypothetical protein; matrilin-4; Not given

Species: Homo sapiens (Human) in every legible row. Division: PRI. Genome Threader entries: values from 449 down to 403, each with 100% confidence (unmaskedGT). PSI-BLAST entries: percent identities with aligned query and target ranges (unmaskedSW).


FIG. 4 



http://www.sanger.ac.uk/cgi-bin/Pfam/nph-search.cgi

The Sanger Centre
Pfam: Protein families database of alignments and HMMs

Home | Keyword search | Protein search | DNA search | Browse Pfam | Taxonomy search | Help

Results for gi|1788084|gb|AAC74854.1|

There were no matches to Pfam-A (including borderline matches) for gi|1788084|gb|AAC74854.1|

Matches to Pfam-B

Domain         Start   End   Evalue     Alignment
Pfam-B 39416   233     423   3.7e-103   Align



Alignments of Pfam-B domains to best-matching Pfam-B sequence 



Format for fetching alignments to Pfam-B families: Hypertext linked to swisspfam

Query gi|1788084|gb|AAC74854.1|/233-423 matching Pfam-B 39416

YEAMJBC0LI 233 DLRYKH YEKRPDP S S Q AVMF CLMD V S G SMD Q STKBM AKRF YILL YLFL SR 282 
DLRYKH YEKRPDP S 5 QAVMF CLMDV SO SMD Q STKDM AKRFYILLYLFL SR 
gi|1788084|gb|AAC74854.1| 233 DIJ^Yl^El^DPSSQAVMPCIJ4IWSGSMDQSTi™i«RFYILLYLPLSR 282 

YEAM_EC0LI 283 TYKHVEVVYIRHHT QAKZVDEHEFF Y S QET G 6TI V S S ALKLMDEVVKERY 332 
TYKHVEVVYIRHHT Q AKE VDEHEFF Y SQETGGTIVSSAL KLMDEVVKERY 
gi|1788084|gb|AAC74854.1| 283 TYKHVE W YIRHHT Q AKE VDEHEFF Y S QET G GTIV S S ALKLMDEVVKERY 332 

YEAM_EC0LI 333 HP AQWHI YAAQASD GDNWADD SPL CMEILAKKLLPVVRYY S YIEITRRAH 382 
HP AQWHIY AAQ ASD GDHWADD SPL CHEILAKKLLPVVRYY S YXEXTRRAH 
gi| 1788084| gb | AAC 74854. 1| 333 HP AQWHIYAAQASD GDHWADD 5PL CMEILAKKLLPVVRYY SYIEITRRAK 382 

YE AH_E C 0LI 383 QTLWREYEKLQSTFDHFAMQKIRDQDDIYPVFRELFMKQHA 423 
QTLWREYEML Q STFDHF AM QKTRD QDDI YP VFRELFHKQHA 
gi| 1788084| gb |AAC 74854. 1| 383 QTLWREYEML Q STFDHF AM QKIRD QDDIYPVFRELFHKQHA 423 



Align to family 



If you think there is anything wrong with this script, please contact Pfam 



FIG. 5 
LOCUS       AAC74854      427 aa      BCT       01-DEC-2000
DEFINITION  orf, hypothetical protein [Escherichia coli K12].
ACCESSION   AAC74854
PID         g1788084
VERSION     AAC74854.1  GI:1788084
DBSOURCE    locus AE000273 accession AE000273.1
KEYWORDS
SOURCE      Escherichia coli K12.
  ORGANISM  Escherichia coli K12
            Bacteria; Proteobacteria; gamma subdivision; Enterobacteriaceae;
            Escherichia.
REFERENCE   1  (residues 1 to 427)
  AUTHORS   Blattner,F.R., Plunkett,G. III, Bloch,C.A., Perna,N.T., Burland,V.,
            Riley,M., Collado-Vides,J., Glasner,J.D., Rode,C.K., Mayhew,G.F.,
            Gregor,J., Davis,N.W., Kirkpatrick,H.A., Goeden,M.A., Rose,D.J.,
            Mau,B. and Shao,Y.
  TITLE     The complete genome sequence of Escherichia coli K-12
  JOURNAL   Science 277 (5331), 1453-1474 (1997)
  MEDLINE   97426617
  PUBMED    9278503
REFERENCE   2  (residues 1 to 427)
  AUTHORS   Blattner,F.R.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-JAN-1997) Guy Plunkett III, Laboratory of Genetics,
            University of Wisconsin, 445 Henry Mall, Madison, WI 53706, USA.
            Email: ecoli@genetics.wisc.edu Phone: 608-262-2534 Fax:
            608-263-7459
REFERENCE   3  (residues 1 to 427)
  AUTHORS   Blattner,F.R.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-SEP-1997) Guy Plunkett III, Laboratory of Genetics,
            University of Wisconsin, 445 Henry Mall, Madison, WI 53706, USA.
            Email: ecoli@genetics.wisc.edu Phone: 608-262-2534 Fax:
            608-263-7459
REFERENCE   4  (residues 1 to 427)
  AUTHORS   Plunkett,G. III.
  TITLE     Direct Submission
  JOURNAL   Submitted (13-OCT-1998) Laboratory of Genetics, University of
            Wisconsin, 445 Henry Mall, Madison, WI 53706, USA
COMMENT     This sequence was determined by the E. coli Genome Project at the
            University of Wisconsin-Madison (Frederick R. Blattner, director).
            Supported by NIH grants HG00301 and HG01428 (from the Human Genome
            Project and NCHGR). The entire sequence was independently
            determined from E. coli K12 strain MG1655. Predicted open reading
            frames were determined using GeneMark software, kindly supplied by
            Mark Borodovsky, Georgia Institute of Technology, Atlanta, GA,
            30332 [e-mail: mark@amber.gatech.edu]. Open reading frames that
            have been correlated with genetic loci are being annotated with CG
            Site Nos., unique ID nos. for the genes in the E. coli Genetic
            Stock Center (CGSC) database at Yale University, kindly supplied by
            Mary Berlyn. A public version of the database is accessible
            (http://cgsc.biology.yale.edu). Annotation of the genome is an
            ongoing task whose goal is to make the genome sequence more useful
            by correlating it with other data. Comments to the authors are
            appreciated. Updated information will be available at the E. coli
            Genome Project's World Wide Web site
            (http://www.genetics.wisc.edu). *** The E. coli K12 sequence and
            its annotations are periodically updated; this is version M54. No
            sequence changes. Annotation updates: updated gene identifications
            and products; all new functional assignments courtesy of Monica
            Riley; added promoters, protein binding sites, and repeated
            sequences described in reference 1. The unique numeric identifiers
            beginning with a lowercase 'b' assigned to each gene (protein- or
            RNA-encoding) are now designated as gene synonyms instead of
            labels. This should allow them to be searched for in Entrez as gene
            names.
            Method: conceptual translation.
FEATURES             Location/Qualifiers
     source          1..427
                     /organism="Escherichia coli K12"
                     /strain="K12"
                     /sub_strain="MG1655"
                     /db_xref="taxon:83333"
     Protein         1..427
                     /function="orf; unknown"
                     /product="orf, hypothetical protein"
     CDS             1..427
                     /gene="yeaH"
                     /coded_by="1788078:6385..7668"
                     /transl_table=11
                     /note="o427; This 427 aa ORF is 28 pct identical (43 gaps)
                     to 327 residues of an approx. 312 aa protein YZDC_BACSU
                     SW: P45742"
ORIGIN
        1 mtwfidrrln gknksmvnrq rflrrykaqi kqsiseaink rsvtdvdsge svsiptedis
       61 epmfhqgrgg Irhrvhpgnd hfvqndrier p<igggggsgs gqgqasqdge gqdefvfqis
      121 kdeyldllfe dlalpnUcqn qqrqlteykt hragytangv paniswrsl qnslarrtam
      181 taakrrelha leenlaiisn sepaglleee rlrkeiaelr akiervpfid tfdlryknye
FIG. 9
http://victoria.inpharmatica.co.uk/~volker/BPD3target.html

Target Mining Interface

Select Your Query Sequence

• Enter PDB accession number (e.g. 1QMA): 1AOX and chain (e.g. B): a

OR

• Enter one Swiss-Prot accession (e.g. P27504) or GenBank protein ID (e.g. CAB08761.1):

Select Database

Release:

DEVF9 = BPD3

Apply Filters

• Iteration Filter. PSI-BLAST matches to be excluded:

None

If you select e.g. "Matches detected during the first 3 iterations", these matches will be excluded from the
report (using the first_PB_iter annotation). This allows you to focus on more remote homologues which have
been detected after 4 or more PSI-BLAST iterations. Matches detected using PSI-BLAST with negative
iterations or using Genome-Threader are not affected by this option. However, if a match is found both during
the first e.g. 3 PSI-BLAST iterations and by Genome-Threader, it will be excluded.
FIG. 12

The Sanger Centre
Pfam: Protein families database of alignments and HMMs

Home | Keyword search | Protein search | DNA search | Browse Pfam | Taxonomy search | Help

Results for gi|2367274|gb|AAC76768.1|

There were no matches to Pfam-A (including borderline matches) for gi|2367274|gb|AAC76768.1|

Matches to Pfam-B

Domain         Start   End   Evalue     Alignment
Pfam-B 15204   204     408   2.4e-108   Align

[427 residues]

Alignments of Pfam-B domains to best-matching Pfam-B sequence

Format for fetching alignments to Pfam-B families: Hypertext linked to swisspfam

Query gi|2367274|gb|AAC76768.1|/204-408 matching Pfam-B 15204



YIEMECOLI 
gi| 2367274 |gb| AAC76768.lt 

OTJECOU 
gi| 2367274| gb|AAC76768H 

TIEM_ECOLI 
gi| 2357274 1 j^l AAC76768.1| 

gi|23$7274|g*|AAC76768.1| 
Yira.BCOU 
gi|23S7274 | gb| AAC76768.1| 



204 DILWJJ>PEIJttt6iraZYEmjaVEKQU.TYl^ 253 

DI1R1LPPE1AT1 ttmETETYRRLVERQLLTYRUttE SWIWn RPV 
204 DI1RLLPPELATL GITELEYEFYRRL VERQLLTYRLHGE SYREKVIERPV 253 

254 raOTDEQPRCPmCTOTSCSaOCTOQCAKmiJU^^ 303 

VXKDTDE R6PFIV CVDT 5 G Stt 6 6FHE Q C AKAP CLALHRI ALAEHRRCY 
254 YHKD7DE QPRSPFI V CVDT 56 SM 6 $FHE QCAXAF C LALBRIALAEflRRCT 303 

304 IML P STEIVRYEL S GP Q GIEQAIPJX S Q QFRG 6TDIJtf CFRAIMERI. Q SR 353 

IMLPSTEI7RYE15GPQ0IEQAIRrL5QQPR69TDIJlSCFRAIMERLQ5R 
304 IM1P STEIVRYEI S GP Q GIZ Q AIR7L 5 Q QFRG CTDLAS CPRAIMERL Q SR 353 

354 EWD ADAWT SDFIAQRLPDDVT SRVKEL QRVHQMRFHAV JW5AH6KP 61 403 

EOTDAD AWI SDPI AQRLPDDVT SKVXEt QRVKQXRFKAV AMSAK8KP 61 
354 EOTD AD AWI SDFI AQRLPDDVT SXVKEL QRVH QHRPKAV AttS AK6KP 01 403 

404 HRIFD 408 

BRIFD 
404 MRIPD 408 



Align to family

If you think there is anything wrong with this script, please contact Pfam



FIG. 13 
http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi

LOCUS       AAC76768      427 aa      BCT       01-DEC-2000
DEFINITION  orf, hypothetical protein [Escherichia coli K12].
ACCESSION   AAC76768
PID         g2367274
VERSION     AAC76768.1  GI:2367274
DBSOURCE    locus AE000451 accession AE000451.
KEYWORDS
SOURCE      Escherichia coli K12.
  ORGANISM  Escherichia coli K12
            Bacteria; Proteobacteria; gamma subdivision; Enterobacteriaceae;
            Escherichia.
REFERENCE   1  (residues 1 to 427)
  AUTHORS   Blattner,F.R., Plunkett,G. III, Bloch,C.A., Perna,N.T., Burland,V.,
            Riley,M., Collado-Vides,J., Glasner,J.D., Rode,C.K., Mayhew,G.F.,
            Gregor,J., Davis,N.W., Kirkpatrick,H.A., Goeden,M.A., Rose,D.J.,
            Mau,B. and Shao,Y.
  TITLE     The complete genome sequence of Escherichia coli K-12
  JOURNAL   Science 277 (5331), 1453-1474 (1997)
  MEDLINE   97426617
  PUBMED    9278503
REFERENCE   2  (residues 1 to 427)
  AUTHORS   Blattner,F.R.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-JAN-1997) Guy Plunkett III, Laboratory of Genetics,
            University of Wisconsin, 445 Henry Mall, Madison, WI 53706, USA.
            Email: ecoli@genetics.wisc.edu Phone: 608-262-2534 Fax:
            608-263-7459
REFERENCE   3  (residues 1 to 427)
  AUTHORS   Blattner,F.R.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-SEP-1997) Guy Plunkett III, Laboratory of Genetics,
            University of Wisconsin, 445 Henry Mall, Madison, WI 53706, USA.
            Email: ecoli@genetics.wisc.edu Phone: 608-262-2534 Fax:
            608-263-7459
REFERENCE   4  (residues 1 to 427)
  AUTHORS   Plunkett,G. III.
  TITLE     Direct Submission
  JOURNAL   Submitted (13-OCT-1998) Laboratory of Genetics, University of
            Wisconsin, 445 Henry Mall, Madison, WI 53706, USA
COMMENT     This sequence was determined by the E. coli Genome Project at the
            University of Wisconsin-Madison (Frederick R. Blattner, director).
            Supported by NIH grants HG00301 and HG01428 (from the Human Genome
            Project and NCHGR). The entire sequence was independently
            determined from E. coli K12 strain MG1655. Predicted open reading
            frames were determined using GeneMark software, kindly supplied by
            Mark Borodovsky, Georgia Institute of Technology, Atlanta, GA,
            30332 [e-mail: mark@amber.gatech.edu]. Open reading frames that
            have been correlated with genetic loci are being annotated with CG
            Site Nos., unique ID nos. for the genes in the E. coli Genetic
            Stock Center (CGSC) database at Yale University, kindly supplied by
            Mary Berlyn. A public version of the database is accessible
            (http://cgsc.biology.yale.edu). Annotation of the genome is an
            ongoing task whose goal is to make the genome sequence more useful
            by correlating it with other data. Comments to the authors are
            appreciated. Updated information will be available at the E. coli
            Genome Project's World Wide Web site
            (http://www.genetics.wisc.edu). *** The E. coli K12 sequence and
            its annotations are periodically updated; this is version M54. No
            sequence changes. Annotation updates: updated gene identifications
            and products; all new functional assignments courtesy of Monica
            Riley; added promoters, protein binding sites, and repeated
            sequences described in reference 1. The unique numeric identifiers
            beginning with a lowercase 'b' assigned to each gene (protein- or
            RNA-encoding) are now designated as gene synonyms instead of
            labels. This should allow them to be searched for in Entrez as gene
            names.
            Method: conceptual translation.
FEATURES             Location/Qualifiers
     source          1..427
                     /organism="Escherichia coli K12"
                     /strain="K12"
                     /sub_strain="MG1655"
                     /db_xref="taxon:83333"
     Protein         1..427
                     /function="orf; unknown"
                     /product="orf, hypothetical protein"
     CDS             1..427
                     /gene="yieM"
                     /coded_by="complement(2367272:5249..6532)"
                     /transl_table=11
                     /note="f427; sequence change joins ORFs yieD and yiett from
                     earlier version"
ORIGIN
        1 mrsrlkdary ppelteevmc yqqsqllstp qfivqlpqil dllhrinspw aeqarqlvda
       61 nstitsalht lflqrwrlsl ivqattlnqq Ueeercqll sevqermtls gqlepiladn
      121 n ^ a ? grAvdm JASrqikrgdy qlivkygefl neqpelkrla eqlgrsreak siprndaqme
      181 tfrtmvrepa tvpeqvdglq qsddllrllp pelatlcite lmtfnrrtv »v{,i»torih
FIG. 14A

FIG. 16B

FIG. 16A

1aox: MG400



FIG. 17 
http://victoria.inpharmatica.co.uk/~volker/BPD3target.html

Target Mining Interface

Select Your Query Sequence

• Enter PDB accession number (e.g. 1QMA): 1JLM and chain (e.g. B):

OR

• Enter one Swiss-Prot accession (e.g. P27504) or GenBank protein ID (e.g. CAB08761.1):

Select Database

Release:

DEVF9 = BPD3

Apply Filters

• Iteration Filter. PSI-BLAST matches to be excluded:

None

If you select e.g. "Matches detected during the first 3 iterations", these matches will be excluded from the report
(using the first_PB_iter annotation). This allows you to focus on more remote homologues which have been
detected after 4 or more PSI-BLAST iterations. Matches detected using PSI-BLAST with negative iterations or
using Genome-Threader are not affected by this option. However, if a match is found both during the first e.g. 3
PSI-BLAST iterations and by Genome-Threader, it will be excluded.

• Filter for the following SPECIES:

□ Homo sapiens
□ Rattus norvegicus (Rat)
□ Mus musculus (Mouse)
□ Danio rerio (Zebra fish)



FIG. 18A 
http://London-bridge.inpharmatica.co.uk/cgi-bin/volker/getTargetBPD3.pl

2) 81 additional hits identified by both Genome Threader and PSI-BLAST:

Combined Genome Threader and PSI-BLAST output: PSI-BLAST values are shown in maroon!
FIG. 20

http://www.sanger.ac.uk/cgi-bin/Pfam/nph-search.cgi

The Sanger Centre
Pfam: Protein families database of alignments and HMMs

Home | Keyword search | Protein search | DNA search | Browse Pfam | Taxonomy search | Help

Results for gi|133251|sp|P10155|RO60_HUMAN

There were no matches to Pfam-A (including borderline matches) for gi|133251|sp|P10155|RO60_HUMAN

Matches to Pfam-B

Domain         Start   End   Evalue     Alignment
Pfam-B 8344    1       194   2.3e-103   Align
Pfam-B 10162   195     538   1.8e-165   Align

[538 residues]

Alignments of Pfam-B domains to best-matching Pfam-B sequence

Format for fetching alignments to Pfam-B families: Hypertext linked to swisspfam

Query gi|133251|sp|P10155|RO60_HUMAN/1-194 matching Pfam-B 8344



Q92787 1 HEE 5 VN QM QPLHEKQI AN S QD GYVWQVT D MNRLHRFL CF G SB G GT YYXKE 50 
MEE S VN Q« QPLHEKQI AN S QD GYVWQVTDMNRLMRFL CFGSEG GTYYIKE 
gi| 133251 1 sp|Pl0155|R0S0_HUMAH 1 WEE S VN QW Q PLN EK.QI AN S QD GYVWQVTDMNRLMRFL CFGSEG GTYYIKE 50 

Q92787 51 QKLGLEHAEALIRI.XEDCRGCBVIQEIK.SPSQEGKrrKQEPMLPAI.AICS 100 
QKL GLEN AE ALIRLIED GRG CEVT QEXKSF S QE GRTTKQEPMLF ALAI C S 
gi| 133251| Sp|P 10155 |RO60_HUMAM 51 QKL GLEN ABALIRLIED GRG CEVI QEXKSF S QE GRTTKQEPMLF AL AX C S 100 

Q92787 101 Q C SDI STKQAAFKAV SEV CRIPTMLFTFI QFKKDLKE SMKC GMWGRALRK 150 
Q C SDI STKQAAFKAV SEV CRIPTMLFTFI QFKKDLKE SMKC GMWGRALRK 
gi| 133251| Sp | P 10155 1 R0S0_HUMAN 101 Q C SDI STKQAAFKAV SEV CRIPTMLFTFI QFKKDLKE SMKC GMWGRALRK 150 

Q92787 151 AIADWYNEKGGMALALAVTKYKQRN GWSMKDLLRL SHLKP S SE 6 194 
AI ADWY NEKG CM ALAL AVTKYKQRN GWSMKDLLRL SHLKP S SE G 
gi| 133251 |sp|Pl0155|R060_HUMAN 151 AIADWYNEKGGMALALAVTKYKQRH GWSMKDLLRL SHLKP SSEG 194 



Align to family

Query gi|133251|sp|P10155|RO60_HUMAN/195-538 matching



008848 195 LAXVTKYITKGWKEVHEEYKEKAL 5VEAEKLLKVLEAVEKVKRTKDDLEV 244 
LAIVTKYITKGVKEVME VKEKALSVE EKLLKYLEAVEKVKRTKD+LEV 
gi| 133251 |sp|P1015S|R060_HUMAN 195 LAIVTKYITKGWKEVMELVKEKALSVETEKLLKYLEAVEKVKRTKDELEV 244 

008848 245 IMLIEEMQLVREMLLTHMLKSKEVWKALLQEMPLTALLRNLGKWTANSVL 294 
TKI.TT!1?M+1.WREMLLTNMLKSKEVWKALL0EMPLTALLRNLGK«TANSVL 

008848 295 EPGNSEVSLICEKLSNEKLLKKARIMPFMVLIALETYRA6MGLRGKLW?I 344 
EPGHSEVSL+ CEKL NEKLLKKARIMPFM+LIALETY+ GHGLRGKLKW 
gi| 1332511 sp | Pl0155|R0S0_MUMAN 295 EPGNSEV SLVCEKL CNEKLLKKARIMPFMILIALETYKT GMGLRGKLKWR 344 

008848 345 PDKDIL QALDAAF YTTFKTVEPT GKRFLLAVDV SASMNQRALGSVLNAST 394 
FIG. 21

LOCUS       RO60_HUMAN    538 aa      PRI       01-FEB-1996
DEFINITION  60 KD RO PROTEIN (60 KD RIBONUCLEOPROTEIN RO) (RORNP) (SJOGREN
            SYNDROME TYPE A ANTIGEN (SS-A)).
ACCESSION   P10155
PID         g133251
VERSION     P10155  GI:133251
DBSOURCE    swissprot: locus RO60_HUMAN, accession P10155;
            class: standard.
            created: Mar 1, 1989.
            sequence updated: Mar 1, 1989.
            annotation updated: Feb 1, 1996.
            xrefs: gi: gi: 177782, gi: gi: 177783, gi: gi: 387656, gi: gi:
            337657, gi: gi: 86722, gi: gi: 107626
            xrefs (non-sequence databases): MIM 600063, MIM 234700, PROSITE
            PS00030
KEYWORDS    Ribonucleoprotein; RNA-binding; Systemic lupus erythematosus;
            Antigen.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;
            Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (residues 1 to 538)
  AUTHORS   Deutscher,S.L., Harley,J.B. and Keene,J.D.
  TITLE     Molecular analysis of the 60-kDa human Ro ribonucleoprotein
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 85 (24), 9479-9483 (1988)
  MEDLINE   89071722
  REMARK    SEQUENCE FROM N.A.
REFERENCE   2  (residues 1 to 538)
  AUTHORS   Ben-Chetrit,E., Gandy,B.J., Tan,E.M. and Sullivan,K.F.
  TITLE     Isolation and characterisation of a cDNA clone encoding the 60-kD
            component of the human SS-A/Ro ribonucleoprotein autoantigen
  JOURNAL   J. Clin. Invest. 83 (4), 1284-1292 (1989)
  MEDLINE   89198084
  REMARK    SEQUENCE FROM N.A.
COMMENT     This SWISS-PROT entry is copyright. It is produced through a
            collaboration between the Swiss Institute of Bioinformatics and
            the EMBL outstation - the European Bioinformatics Institute.
            The original entry is available from http://www.expasy.ch/sprot
            and http://www.ebi.ac.uk/sprot
            [FUNCTION] UNKNOWN.
            [SUBUNIT] RO SMALL RIBONUCLEOPROTEINS CONSIST OF FOUR SMALL RNA
            MOLECULES OF 85-112 NT, EACH OF WHICH IS COMPLEXED WITH A 60 KD
            PROTEIN. RO RNPS MAY ALSO CONTAIN AN ADDITIONAL 52 KD PROTEIN.
            [SUBCELLULAR LOCATION] CYTOPLASMIC.
            [DISEASE] SERA FROM PATIENTS WITH SYSTEMIC LUPUS ERYTHEMATOSUS
            OFTEN CONTAIN ANTIBODIES THAT REACT WITH THE NORMAL CELLULAR RO
            PROTEIN AS IF THIS ANTIGEN WAS FOREIGN.
            [SIMILARITY] CONTAINS 1 RNA RECOGNITION MOTIF (RNP).
            [SIMILARITY] STRONG, TO XENOPUS 60 KD RO PROTEIN.
FEATURES             Location/Qualifiers
     source          1..538
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
     Protein         1..538
                     /product="60 KD RO PROTEIN"
     Region          93..98
                     /region_name="Domain"
                     /note="RNA-BINDING (RNP2) (BY SIMILARITY)."
     Region          124..131
                     /region_name="Domain"
                     /note="RNA-BINDING (RNP1) (BY SIMILARITY)."
     Region          239
                     /region_name="Conflict"
                     /note="K -> R (IN REF. 2)."
     Region          515..538
                     /region_name="Conflict"
                     /note="GMLDMCGFDTGALDVIRNFTLDMI -> ALQNTLLNKSF (IN REF.
                     2)."
ORIGIN
        1 meesvnqmqp Inekqiansq dgyvwqvtdm nrlhrflcfg seggtyyike qklglenaea
       61 lirliedgrg cevigeiksf sqegrttkqe pmlfalaics qcsdistkqa aflcavsevcr
      121 ipthlftfiq fkkdlkesmk cgmwgralrk aiadwyneJcg gmalalavtk ykqrngwshk
      181 dllrlshlkp sseglaivtk yitkgwkevh elykekalsv etekllxyle avekvkrtkd
      241 elevihliee hrlvrehllt nhikskevwk allqemplta llmlgkmta nsvlepgnse
      301 vslvceklcn ekllkkarih pfhilialet yktghglrgk lkwrpdeeil kaldaafykt
      361 fktveptgkr fllavdvsas mnqrvlgsil nastvaaamc mwtrtekds ywafsdemv
      421 pcpvttdmtl qqvlmamsqi paggtdcslp miwaqktntp advfivftdn etfaggvhpa
      481 ialreyrkkm dipakliveg mtsngftiad pddrgmldmc gfdtgaldvi mftldmi
//